# Efficient Speech Recognition

Parakeet Tdt 0.6b V2 Mlx

This is an automatic speech recognition model converted to MLX format for fast inference.

Speech Recognition

Safetensors English

Faster Distil Whisper Large V3.5

Distil-Whisper is a distilled version of the Whisper model, optimized for Automatic Speech Recognition (ASR) tasks, offering faster inference speeds.

Speech Recognition English

Faster Distil Whisper Large V3.5

A CTranslate2-format model converted from Distil-Whisper large-v3.5 for efficient speech recognition.

Speech Recognition English

Distil Large V3.5 ONNX

Distil-Whisper is a knowledge-distilled version of OpenAI Whisper-Large-v3, offering superior performance and efficiency.

Speech Recognition

Transformers English

Distil Large V3.5 Ct2

Distil-Whisper is a distilled version of the Whisper model, trained with large-scale pseudo-labeling for efficient speech recognition.

Speech Recognition English

Distil Large V3.5

Distil-Whisper is a knowledge-distilled version of OpenAI Whisper-Large-v3, achieving efficient speech recognition through large-scale pseudo-label training.

Speech Recognition

Transformers English

Faster Whisper V2 D4

This is an optimized Hebrew and English speech recognition model based on the Whisper model, developed by ivrit.ai.

Speech Recognition Supports Multiple Languages

Distil Large V3

Distil-Whisper is a knowledge-distilled version of Whisper large-v3, focusing on English automatic speech recognition, offering faster inference speeds while maintaining accuracy close to the original model.

Speech Recognition English

Parakeet Tdt 1.1b

Parakeet TDT 1.1B is an automatic speech recognition (ASR) model jointly developed by NVIDIA NeMo and Suno.ai, capable of transcribing speech into lowercase English letters.

Speech Recognition English

Faster Distil Whisper Medium.en

This is a version of the distil-whisper/distil-medium.en model converted to CTranslate2 format for efficient speech recognition tasks.

Speech Recognition English

Faster Distil Whisper Large V2

This is a distilled version of the automatic speech recognition (ASR) model based on the Whisper architecture, designed for efficient inference and suitable for English speech-to-text tasks.

Speech Recognition English

Sew D Mid 400k Ft Ls100h

SEW-D-mid is a pre-trained speech model developed by ASAPP Research for automatic speech recognition, striking a good balance between performance and efficiency.

Speech Recognition

Transformers English

Sew D Tiny 100k Ft Ls100h

SEW-D-tiny is an efficient pre-trained speech recognition model developed by ASAPP Research, designed to balance performance and efficiency.

Speech Recognition

Transformers English

SEW-tiny is a compact, efficient pre-trained speech model developed by ASAPP Research. It was pre-trained on speech audio sampled at 16 kHz and is suitable for a range of downstream speech tasks.

Speech Recognition

Transformers Supports Multiple Languages

Sew D Base Plus 400k Ft Ls100h

SEW-D-base+ is an efficient speech recognition model developed by ASAPP Research. It was pre-trained on speech audio sampled at 16 kHz and performs well on the LibriSpeech dataset.

Speech Recognition

Transformers English

Sew Tiny 100k Ft Ls100h

SEW (Squeezed and Efficient Wav2vec) is a pre-trained speech recognition model developed by ASAPP Research that improves on wav2vec 2.0 in both performance and efficiency.

Speech Recognition

Transformers Supports Multiple Languages

Sew D Mid K127 400k Ft Ls100h

SEW-D-mid-k127 is an efficient pre-trained speech recognition model developed by ASAPP Research that shows significant improvements in performance and efficiency over wav2vec 2.0.

Speech Recognition

Transformers English
